[SPARK-24891][SQL] Fix HandleNullInputsForUDF rule #21851
```scala
// branch of `If` will be called if any of these checked inputs is null. Thus we can
// prevent this rule from being applied repeatedly.
val newInputs = parameterTypes.zip(inputs).map { case (cls, expr) =>
  if (needsNullCheck(cls, expr)) AssertNotNull(expr) else expr
}
```
Shall we introduce `KnownNotNull` instead of using `AssertNotNull`, which has a side effect (it throws a `NullPointerException` when its input is null)?
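To make the side-effect concern concrete, here is a toy model in Scala. It is not Spark's code: `Expr`, `Attr`, `AssertNotNullLike`, and `KnownNotNullLike` are hypothetical stand-ins, and a row is modeled as a `Map` from column name to value. Both wrappers report `nullable = false` to the analyzer, but only the assert-style wrapper changes runtime behavior (it throws on a null input), while the marker wrapper is a pure annotation.

```scala
// Toy model (hypothetical; NOT Spark's classes). A row is a Map from column name to value.
sealed trait Expr {
  def nullable: Boolean                 // what the analyzer believes
  def eval(row: Map[String, Any]): Any  // what actually happens at runtime
}

final case class Attr(name: String, nullable: Boolean) extends Expr {
  def eval(row: Map[String, Any]): Any = row.getOrElse(name, null)
}

// Assert-style wrapper: non-nullable, but evaluation throws on a null input.
// The exception is the runtime side effect raised in the review comment.
final case class AssertNotNullLike(child: Expr) extends Expr {
  val nullable = false
  def eval(row: Map[String, Any]): Any = {
    val v = child.eval(row)
    if (v == null) throw new NullPointerException(s"null value in non-nullable input: $child")
    v
  }
}

// Marker wrapper: only flips the nullability flag; evaluation is a pass-through.
final case class KnownNotNullLike(child: Expr) extends Expr {
  val nullable = false
  def eval(row: Map[String, Any]): Any = child.eval(row)
}
```

Inside an `If` branch that only runs when the checked inputs are non-null, the marker states something already guaranteed, so it adds no runtime check and cannot fail. That is why a marker suits this rule better than an assertion.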
Test build #93459 has finished for PR 21851 at commit
Test build #93469 has finished for PR 21851 at commit
retest this please
Update the PR description?
Test build #93503 has finished for PR 21851 at commit
retest this please
Test build #93514 has finished for PR 21851 at commit
LGTM. Thanks! Merged to master/2.3.
Author: maryannxue <[email protected]>
Closes #21851 from maryannxue/spark-24891.
What changes were proposed in this pull request?
The `HandleNullInputsForUDF` rule would always add a new `If` node every time it was applied. As a result, the same plan analyzed once and analyzed twice (or more) would differ, causing issues such as the plan not being matched in the cache manager. The solution is to mark the arguments as null-checked by adding a `KnownNotNull` node above each of those arguments when placing the UDF under an `If` node. This is safe because the UDF is never called when any of those arguments is null, and because the wrapped arguments are then non-nullable, a later application of the rule finds nothing to null-check and leaves the plan unchanged.
How was this patch tested?
Add new tests under sql/UDFSuite and AnalysisSuite.
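The idempotence problem and the fix can be sketched with a toy analyzer in Scala. Everything here is hypothetical: `Expr`, `Udf`, `AnyNull`, `KnownNotNullLike`, `transformUp`, `buggyRule`, and `fixedRule` are illustrative stand-ins rather than Catalyst's API, and the toy assumes every UDF parameter is primitive-typed, so any nullable input needs a null check. The buggy rule re-matches the UDF inside the `If` it just created; the fixed rule also wraps the checked inputs in a marker so they become non-nullable.

```scala
// Toy expression tree (hypothetical; NOT Catalyst's classes).
sealed trait Expr { def nullable: Boolean }
final case class Attr(name: String, nullable: Boolean) extends Expr
case object NullLit extends Expr { val nullable = true }
// True when any child is null (the null check guarding the UDF).
final case class AnyNull(children: Seq[Expr]) extends Expr { val nullable = false }
final case class If(cond: Expr, whenTrue: Expr, whenFalse: Expr) extends Expr { val nullable = true }
// Marker: tells the analyzer its child cannot be null; no runtime check.
final case class KnownNotNullLike(child: Expr) extends Expr { val nullable = false }
// Toy UDF call; all parameters are assumed to be primitive-typed.
final case class Udf(name: String, inputs: Seq[Expr]) extends Expr { val nullable = true }

// Bottom-up rewrite: rebuild the children first, then try the rule on the node.
def transformUp(e: Expr)(rule: PartialFunction[Expr, Expr]): Expr = {
  val rebuilt = e match {
    case AnyNull(cs)         => AnyNull(cs.map(c => transformUp(c)(rule)))
    case If(c, t, f)         => If(transformUp(c)(rule), transformUp(t)(rule), transformUp(f)(rule))
    case KnownNotNullLike(c) => KnownNotNullLike(transformUp(c)(rule))
    case Udf(n, in)          => Udf(n, in.map(x => transformUp(x)(rule)))
    case leaf                => leaf
  }
  rule.applyOrElse(rebuilt, (x: Expr) => x)
}

// Before the fix: guard the UDF with an If but leave its inputs untouched.
// The UDF still has nullable inputs, so the next pass matches it again.
val buggyRule: PartialFunction[Expr, Expr] = {
  case udf @ Udf(_, inputs) if inputs.exists(_.nullable) =>
    If(AnyNull(inputs.filter(_.nullable)), NullLit, udf)
}

// After the fix: also wrap each checked input in KnownNotNullLike, so the
// guarded UDF has only non-nullable inputs and is not matched again.
val fixedRule: PartialFunction[Expr, Expr] = {
  case Udf(name, inputs) if inputs.exists(_.nullable) =>
    val newInputs = inputs.map(in => if (in.nullable) KnownNotNullLike(in) else in)
    If(AnyNull(inputs.filter(_.nullable)), NullLit, Udf(name, newInputs))
}
```

Applying `fixedRule` twice yields the same tree as applying it once, while `buggyRule` nests another `If` on every pass. That growing tree is the kind of difference between once-analyzed and re-analyzed plans that the description above ties to cache-manager mismatches.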